
    Handling non-compositionality in multilingual CNLs

    In this paper, we describe methods for handling multilingual non-compositional constructions in the framework of Grammatical Framework (GF). We specifically look at methods to detect and extract non-compositional phrases from parallel texts, and we propose methods to handle such constructions in GF grammars. We expect these methods to enrich CNLs by giving more flexibility in the design of controlled languages. We look at two specific use cases of non-compositional constructions: a general-purpose method to detect and extract multilingual multiword expressions, and a procedure to identify nominal compounds in German. We evaluate our procedure for multiword expressions through a qualitative analysis of the results. For the experiments on nominal compounds, we incorporate the detected compounds into a full SMT pipeline and evaluate the impact of our method on the machine translation process.
    Comment: CNL workshop in COLING 201
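The extraction step can be illustrated with a minimal sketch. It assumes that the parallel text has already been word-aligned by some external aligner; the alignment format, the function names and the example sentences are hypothetical, and this is not the paper's actual procedure. A source word linked to a contiguous run of two or more target words is treated as a candidate multiword expression or compound, and candidates are counted across the corpus.

```python
from collections import Counter, defaultdict

def one_to_many_candidates(src_tokens, tgt_tokens, alignment, min_span=2):
    """Yield (source word, target phrase) pairs where one source token is
    aligned to a contiguous run of at least `min_span` target tokens.

    `alignment` is a list of (source index, target index) links, assumed
    to come from an external word aligner.
    """
    links = defaultdict(set)
    for s, t in alignment:
        links[s].add(t)
    for s in sorted(links):
        targets = sorted(links[s])
        contiguous = targets[-1] - targets[0] + 1 == len(targets)
        if len(targets) >= min_span and contiguous:
            yield src_tokens[s], " ".join(tgt_tokens[t] for t in targets)

def extract_candidates(corpus, min_count=1):
    """Count candidates over (source tokens, target tokens, alignment) triples."""
    counts = Counter()
    for src, tgt, alignment in corpus:
        counts.update(one_to_many_candidates(src, tgt, alignment))
    return {pair: n for pair, n in counts.items() if n >= min_count}

# Hypothetical German-English data; the alignments are hand-written here.
corpus = [
    (["der", "Apfelbaum", "blüht"], ["the", "apple", "tree", "blooms"],
     [(0, 0), (1, 1), (1, 2), (2, 3)]),
    (["ein", "Apfelbaum"], ["an", "apple", "tree"],
     [(0, 0), (1, 1), (1, 2)]),
]
print(extract_candidates(corpus, min_count=2))  # prints {('Apfelbaum', 'apple tree'): 2}
```

In practice one would probably also filter candidates by an association score and check them against the grammar's lexicon; those steps are left out here.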

    Automatic conversion of colloquial Finnish to standard Finnish

    This paper presents a rule-based method for converting between colloquial Finnish and standard Finnish. The method relies on a small number of orthographic rules combined with a large language model of standard Finnish, which ranks the possible conversions. In addition, the paper presents an evaluation corpus of aligned sentences in colloquial Finnish, orthographically standardised colloquial Finnish, and standard Finnish. The method outperforms the baseline of simply treating colloquial Finnish as standard Finnish, but is outperformed by a phrase-based MT system trained on the evaluation corpus. The paper also presents preliminary results that show promise for using normalisation in the machine translation task.
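As an illustration of the rules-plus-ranking idea, the sketch below generates candidate standard forms for each token with a few regular-expression rewrite rules and keeps the candidate with the highest score under an add-one-smoothed unigram model. The rules, the toy counts and the function names are hypothetical stand-ins: the paper's rule set is not reproduced here, and a unigram model ignores the sentence context that a real n-gram language model would use.

```python
import math
import re

# Hypothetical orthographic rules: (regex pattern, replacement).
RULES = [
    (r"^mä$", "minä"),
    (r"^oon$", "olen"),
    (r"ee$", "ea"),       # e.g. kauhee -> kauhea (back-vowel words)
    (r"ee$", "eä"),       # e.g. pimee -> pimeä (front-vowel words)
    (r"anen$", "ainen"),  # e.g. punanen -> punainen
]

# Toy word counts standing in for a large model of standard Finnish.
COUNTS = {"minä": 50, "olen": 40, "on": 80, "kauhea": 5,
          "pimeä": 3, "punainen": 7, "talo": 10}

def candidates(token):
    """The token itself plus every form produced by one matching rule."""
    forms = {token}
    for pattern, replacement in RULES:
        if re.search(pattern, token):
            forms.add(re.sub(pattern, replacement, token))
    return forms

def log_prob(word, counts):
    """Add-one-smoothed unigram log-probability, with one slot for unknown words."""
    total = sum(counts.values())
    return math.log((counts.get(word, 0) + 1) / (total + len(counts) + 1))

def normalise(sentence, counts=COUNTS):
    """Pick the best-scoring candidate per token; ties keep the original token."""
    best = []
    for token in sentence.split():
        options = sorted(candidates(token))  # sorted for deterministic ties
        best.append(max(options, key=lambda w: (log_prob(w, counts), w == token)))
    return " ".join(best)

result = normalise("mä oon kauhee")
```

Here both `ee$` rules fire on "kauhee" and on "pimee"; the language model, not the rules, decides which output survives, which mirrors the division of labour described above.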

    An End-to-End Pipeline from Law Text to Logical Formulas

    We propose a pipeline for converting natural English law texts into logical formulas via a series of structural representations. The texts are first parsed using a formal grammar derived from lightweight annotations. An intermediate representation called assembly logic is then used for logical interpretation; it supports translation to different back-end logics and to visualisations. The approach is rule-based and explainable, yet also robust: it can deliver useful results from day one while allowing subsequent refinements and variations.
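The idea of one intermediate representation with several back-ends can be sketched with a small formula type and two independent renderers: one produces conventional logical notation (in ASCII), the other an SMT-LIB-style prefix form. The constructors, predicate names and example rule below are hypothetical; they are not the actual definition of assembly logic.

```python
from dataclasses import dataclass
from typing import Tuple, Union

@dataclass(frozen=True)
class Atom:
    pred: str
    args: Tuple[str, ...] = ()

@dataclass(frozen=True)
class Not:
    body: "Formula"

@dataclass(frozen=True)
class And:
    left: "Formula"
    right: "Formula"

@dataclass(frozen=True)
class Implies:
    antecedent: "Formula"
    consequent: "Formula"

Formula = Union[Atom, Not, And, Implies]

def to_infix(f: Formula) -> str:
    """Back-end 1: conventional infix notation with ASCII connectives."""
    if isinstance(f, Atom):
        return f"{f.pred}({', '.join(f.args)})" if f.args else f.pred
    if isinstance(f, Not):
        return f"~{to_infix(f.body)}"
    if isinstance(f, And):
        return f"({to_infix(f.left)} & {to_infix(f.right)})"
    if isinstance(f, Implies):
        return f"({to_infix(f.antecedent)} -> {to_infix(f.consequent)})"
    raise TypeError(f"not a formula: {f!r}")

def to_sexpr(f: Formula) -> str:
    """Back-end 2: SMT-LIB-style prefix notation."""
    if isinstance(f, Atom):
        return f"({' '.join((f.pred,) + f.args)})" if f.args else f.pred
    if isinstance(f, Not):
        return f"(not {to_sexpr(f.body)})"
    if isinstance(f, And):
        return f"(and {to_sexpr(f.left)} {to_sexpr(f.right)})"
    if isinstance(f, Implies):
        return f"(=> {to_sexpr(f.antecedent)} {to_sexpr(f.consequent)})"
    raise TypeError(f"not a formula: {f!r}")

# Hypothetical reading of "A person who is a resident pays tax."
rule = Implies(And(Atom("person", ("x",)), Atom("resident", ("x",))),
               Atom("pays_tax", ("x",)))
print(to_infix(rule))
print(to_sexpr(rule))
```

Adding a further back-end (or a visualiser) means adding one more renderer over the same formula type, without touching whatever builds the formulas from parse trees.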

    Constraint Grammar as a SAT problem

    We represent Constraint Grammar (CG) as a Boolean satisfiability (SAT) problem. Encoding CG in logic brings some new features to the grammars. The rules are interpreted more declaratively, which makes it possible to abstract away from details such as cautious context and rule ordering. A rule is allowed to affect its context words, which can reduce the number of rules a grammar needs. Ordering can be preserved or discarded; in the latter case, we resolve any rule conflicts by finding a solution that discards the fewest rule applications. We test our implementation by parsing texts of tens to hundreds of thousands of words, using grammars with hundreds of rules.
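A toy version of the unordered encoding can be sketched as follows. Each (word, reading) pair becomes a Boolean variable meaning "this reading is kept"; every word must keep at least one reading; and each application of a REMOVE rule becomes the clause "not context, or not target", so it can be satisfied by dropping either reading, which is how a rule can affect its context words. Instead of calling a SAT solver, the sketch enumerates all assignments (feasible only for a handful of readings) and picks one that satisfies as many rule applications as possible, breaking ties by keeping as many readings as possible. The rule format, tag names and tie-breaking criterion are assumptions made for illustration.

```python
from itertools import product

def disambiguate(sentence, rules):
    """Unordered CG-as-SAT toy: choose which readings to keep.

    sentence: list of (word, set of readings); every word needs >= 1 reading.
    rules: list of (target, offset, context), read as
           "REMOVE target IF (offset context)".
    Returns (analysis, number of discarded rule applications).
    """
    # One Boolean variable per (position, reading): True means "kept".
    variables = [(i, r) for i, (_, rs) in enumerate(sentence) for r in sorted(rs)]
    # One clause "(not context) or (not target)" per rule application in the input.
    clauses = []
    for target, offset, context in rules:
        for i, (_, readings) in enumerate(sentence):
            j = i + offset
            if 0 <= j < len(sentence) and target in readings and context in sentence[j][1]:
                clauses.append(((j, context), (i, target)))
    best, best_key = None, None
    for values in product([True, False], repeat=len(variables)):
        kept = dict(zip(variables, values))
        # Hard constraint: every word keeps at least one reading.
        if not all(any(kept[(i, r)] for r in rs) for i, (_, rs) in enumerate(sentence)):
            continue
        satisfied = sum(not (kept[c] and kept[t]) for c, t in clauses)
        # Objective: most satisfied rule applications, then most readings kept.
        key = (satisfied, sum(values))
        if best_key is None or key > best_key:
            best, best_key = kept, key
    analysis = [(w, {r for r in rs if best[(i, r)]}) for i, (w, rs) in enumerate(sentence)]
    return analysis, len(clauses) - best_key[0]

sentence = [("the", {"det"}), ("can", {"noun", "verb"}), ("rusts", {"noun", "verb"})]
rules = [("verb", -1, "det"),   # REMOVE verb IF (-1 det)
         ("noun", -1, "noun")]  # REMOVE noun IF (-1 noun)
print(disambiguate(sentence, rules))
```

With the contradictory pair REMOVE verb IF (-1 det) and REMOVE noun IF (-1 det) on "the can", no assignment satisfies both clauses, so the sketch keeps one reading of "can" and reports one discarded rule application.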

    Automatic test suite generation for PMCFG grammars

    We present a method for finding errors in formalized natural language grammars by automatically and systematically generating test cases that are intended to be judged by a human oracle. The method works on a per-construction basis: given a construction from the grammar, it generates a finite but complete set of test sentences (typically tens or hundreds) in which that construction is used in all possible ways. Our method is an alternative to using a corpus or a treebank, where no such completeness guarantees can be made. The method is language-independent and is implemented for the grammar formalism PMCFG, but also works for weaker grammar formalisms. We evaluate the method on several grammars for different natural languages, with sizes ranging from toy examples to real-world grammars.

    Formal Methods for Testing Grammars

    Grammar engineering has much in common with software engineering. In place of a program specification, we have descriptive grammar books; in place of unit tests, we have gold standard corpora and test cases for manual inspection. And just like any software, our grammars still contain bugs: grammatical sentences that are rejected, ungrammatical sentences that are parsed, or grammatical sentences that get the wrong parse. This thesis presents two contributions to the analysis and quality control of computational grammars of natural languages. First, we present a method for finding contradictory grammar rules in Constraint Grammar, a robust and low-level formalism for part-of-speech tagging and shallow parsing. Second, we generate minimal and representative test suites of example sentences that cover all grammatical constructions in Grammatical Framework, a multilingual grammar formalism based on deep structural analysis.

    Analysing constraint grammars with a SAT-solver

    We describe a method for analysing Constraint Grammars (CG) that can detect internal conflicts and redundancies in a given grammar without the need for a corpus. The aim is to let grammar writers diagnose their grammars automatically and then improve them manually. Our method works by translating the given grammar into logical constraints that are analysed by a SAT-solver. We have evaluated our analysis on a number of non-trivial grammars and found inconsistencies in them.
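The corpus-free check can be approximated by brute force for a tiny tag set: enumerate every possible ambiguous input of a fixed length, apply the earlier rules in order, and ask whether the rule under test can still remove anything. A rule that never fires points to a conflict or a redundancy. This bounded enumeration is only a stand-in for the SAT-based encoding described in the abstract; the tag set, the three-word window, the simplified REMOVE-only rule format and the function names are all assumptions.

```python
from itertools import combinations, product

TAGS = ("det", "noun", "verb")  # hypothetical tiny tag set

def nonempty_subsets(tags):
    """Every possible ambiguity class: all non-empty sets of readings."""
    return [frozenset(c) for n in range(1, len(tags) + 1) for c in combinations(tags, n)]

def conditions_hold(readings, i, conditions):
    """True if, for every (offset, tag), the word at i + offset has that reading."""
    for offset, tag in conditions:
        j = i + offset
        if not (0 <= j < len(readings) and tag in readings[j]):
            return False
    return True

def apply_remove(readings, rule):
    """Apply one REMOVE rule left to right in place; report whether it fired.
    As in CG, the last remaining reading of a word is never removed."""
    target, conditions = rule
    fired = False
    for i in range(len(readings)):
        if target in readings[i] and len(readings[i]) > 1 and conditions_hold(readings, i, conditions):
            readings[i] = readings[i] - {target}
            fired = True
    return fired

def unreachable_rules(rules, length=3):
    """Indices of rules that no input of `length` words lets fire,
    given that all earlier rules have already been applied."""
    subsets = nonempty_subsets(TAGS)
    dead = []
    for k, rule in enumerate(rules):
        fires_somewhere = False
        for sentence in product(subsets, repeat=length):
            readings = list(sentence)
            for earlier in rules[:k]:
                apply_remove(readings, earlier)
            if apply_remove(readings, rule):
                fires_somewhere = True
                break
        if not fires_somewhere:
            dead.append(k)
    return dead

RULES = [
    ("verb", [(-1, "det")]),               # REMOVE verb IF (-1 det)
    ("verb", [(-1, "det"), (1, "noun")]),  # subsumed by the rule above
    ("noun", [(1, "noun")]),               # REMOVE noun IF (1 noun)
]
print(unreachable_rules(RULES))  # prints [1]
```

Rule 1 is reported because any word it could target has already lost its verb reading to rule 0; a grammar writer would then decide whether to delete it or reorder the rules.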
